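Basic usage follows these steps:

1. Create a replay buffer (`ReplayBuffer.__init__`)
2. Add transitions (`ReplayBuffer.add`)
3. At the end of each episode, call `ReplayBuffer.on_episode_end`
4. Sample transitions (`ReplayBuffer.sample`)

Here is a simple example that stores a standard environment (i.e. `obs`, `act`, `rew`, `next_obs`, and `done`).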
```python
import numpy as np
from cpprb import ReplayBuffer

buffer_size = 256
obs_shape = 3
act_dim = 1

rb = ReplayBuffer(buffer_size,
                  env_dict={"obs": {"shape": obs_shape},
                            "act": {"shape": act_dim},
                            "rew": {},
                            "next_obs": {"shape": obs_shape},
                            "done": {}})

# Dummy transition values (in practice, these come from the environment)
obs = np.ones(shape=(obs_shape))
act = np.ones(shape=(act_dim))
rew = 0
next_obs = np.ones(shape=(obs_shape))
done = 0

for i in range(500):
    rb.add(obs=obs, act=act, rew=rew, next_obs=next_obs, done=done)

    if done:
        # Together with resetting the environment, call ReplayBuffer.on_episode_end()
        rb.on_episode_end()

batch_size = 32
sample = rb.sample(batch_size)
# sample is a dictionary whose keys are 'obs', 'act', 'rew', 'next_obs', and 'done'
```
The parameters of `ReplayBuffer.__init__` are listed below (see also the API reference).
| Name | Type | Optional | Description |
|---|---|---|---|
| `size` | `int` | No | Buffer size |
| `env_dict` | `dict` | Yes (but the buffer is unusable without it) | Environment definition (see here) |
| `next_of` | `str` or array-like of `str` | Yes | Memory compression (see here) |
| `stack_compress` | `str` or array-like of `str` | Yes | Memory compression (see here) |
| `default_dtype` | `numpy.dtype` | Yes | Fallback data type for values whose `dtype` is not given in `env_dict` |
| `Nstep` | `dict` | Yes | N-step configuration (see here) |
| `mmap_prefix` | `str` | Yes | mmap file prefix (see here) |
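To make the role of `default_dtype` more concrete, here is a small sketch of how an `env_dict` could be resolved into per-value shapes and dtypes. The helper `resolve_env_dict` is hypothetical and is not part of cpprb. It assumes that an entry without `"dtype"` falls back to `default_dtype` and that an entry without `"shape"` holds one scalar per transition; the `np.single` default in the signature is also an assumption of this sketch, not a statement about cpprb's internals.

```python
import numpy as np


def resolve_env_dict(env_dict, default_dtype=np.single):
    """Hypothetical helper (not cpprb): fill in missing shape/dtype per entry.

    Assumptions: a missing "shape" means one scalar value per transition,
    and a missing "dtype" falls back to default_dtype.
    """
    resolved = {}
    for name, spec in env_dict.items():
        shape = spec.get("shape", 1)
        # Normalize an int shape such as 3 into a tuple (3,)
        shape = (shape,) if isinstance(shape, int) else tuple(shape)
        # Use the entry's own dtype if present, otherwise the fallback
        dtype = np.dtype(spec.get("dtype", default_dtype))
        resolved[name] = {"shape": shape, "dtype": dtype}
    return resolved


env_dict = {"obs": {"shape": 3},
            "act": {"shape": 1, "dtype": np.int64},
            "rew": {},
            "done": {}}

# "obs", "rew", and "done" get float64; "act" keeps its explicit int64
resolved = resolve_env_dict(env_dict, default_dtype=np.float64)
```

Only entries that omit `"dtype"` are affected by the fallback, so an explicit `dtype` in `env_dict` always wins in this sketch.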
Environment values are defined flexibly by `env_dict` when the buffer is created. The details are described in the documentation.
Because the stored values have user-defined names, you have to pass them to `ReplayBuffer.add` as keyword arguments.
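Why keywords are required can be illustrated with a toy ring buffer. `TinyBuffer` below is a hypothetical, simplified class written only for this illustration; it is not cpprb's implementation and does not reproduce its features (compression, N-step, `on_episode_end`, etc.). It allocates one NumPy array per name in `env_dict`, so `add` only knows which array a value belongs to if the value arrives under that name.

```python
import numpy as np


class TinyBuffer:
    """Hypothetical ring buffer (not cpprb) showing keyword-only add()."""

    def __init__(self, size, env_dict):
        self.size = size
        self.index = 0    # next slot to write
        self.stored = 0   # number of valid transitions
        self.buffer = {}
        for name, spec in env_dict.items():
            shape = spec.get("shape", 1)
            shape = (shape,) if isinstance(shape, int) else tuple(shape)
            # One preallocated array per user-defined name
            self.buffer[name] = np.zeros((size, *shape),
                                         dtype=spec.get("dtype", np.single))

    def add(self, **kwargs):
        # Names are only known at runtime, so values must come as keywords
        missing = self.buffer.keys() - kwargs.keys()
        if missing:
            raise KeyError(f"missing values: {sorted(missing)}")
        for name, value in kwargs.items():
            self.buffer[name][self.index] = value  # unknown names raise KeyError
        self.index = (self.index + 1) % self.size  # overwrite oldest when full
        self.stored = min(self.stored + 1, self.size)

    def sample(self, batch_size, rng=None):
        # Uniformly sample indices among stored transitions (needs stored > 0)
        rng = np.random.default_rng() if rng is None else rng
        idx = rng.integers(0, self.stored, size=batch_size)
        return {name: arr[idx] for name, arr in self.buffer.items()}


rb = TinyBuffer(4, {"obs": {"shape": 3}, "rew": {}})
for i in range(6):
    rb.add(obs=np.full(3, i), rew=i)  # keyword arguments, as with cpprb
batch = rb.sample(2, rng=np.random.default_rng(0))
```

Calling `rb.add(np.ones(3), 0)` fails with a `TypeError` here, because positional values cannot be matched to the names declared in `env_dict`.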